Direct S3 Data Access - Rough PODAAC ECCO SSH Example


Import Required Packages

%matplotlib inline
import matplotlib.pyplot as plt
from datetime import datetime
import os
import subprocess
import requests
import boto3
import s3fs
import pandas as pd
import numpy as np
import xarray as xr
import rasterio as rio
from rasterio.session import AWSSession
from rasterio.plot import show
import rioxarray
import geopandas
import pyproj
from pyproj import Proj
from shapely.ops import transform
import geoviews as gv
from cartopy import crs
import hvplot.xarray
import holoviews as hv
gv.extension('bokeh', 'matplotlib')
from pystac_client import Client
s3_cred_endpoint = {
    'podaac':'https://archive.podaac.earthdata.nasa.gov/s3credentials',
    'lpdaac':'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'
}
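def get_temp_creds():
    # Request temporary S3 credentials from the PO.DAAC endpoint
    temp_creds_url = s3_cred_endpoint['podaac']
    response = requests.get(temp_creds_url)
    response.raise_for_status()  # fail early on an HTTP error
    return response.json()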
temp_creds_req = get_temp_creds()
session = boto3.Session(aws_access_key_id=temp_creds_req['accessKeyId'], 
                        aws_secret_access_key=temp_creds_req['secretAccessKey'],
                        aws_session_token=temp_creds_req['sessionToken'],
                        region_name='us-west-2')
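The session above assumes the credential response contains accessKeyId, secretAccessKey and sessionToken. If one is missing (for example, possibly when Earthdata Login authentication has not been set up), the failure shows up later as a bare KeyError. A minimal sketch of an up-front check follows; validate_temp_creds and REQUIRED_KEYS are hypothetical names, not part of the original notebook.

```python
# Hypothetical helper: check a credential response before using it.
REQUIRED_KEYS = ("accessKeyId", "secretAccessKey", "sessionToken")


def validate_temp_creds(creds):
    """Return only the required credential fields, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError("credential response is missing: " + ", ".join(missing))
    return {key: creds[key] for key in REQUIRED_KEYS}
```

In the notebook this could be called as validate_temp_creds(get_temp_creds()) before building the boto3 session.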
rio_env = rio.Env(AWSSession(session),
                  GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'))
rio_env.__enter__()
<rasterio.env.Env at 0x7f7fcb8b9bb0>
podaac_cat = Client.open('https://cmr.earthdata.nasa.gov/stac/POCLOUD/')
search = podaac_cat.search(
    collections=['ECCO_L4_SSH_05DEG_MONTHLY_V4R4'],
    datetime='2015'
)
search.matched()
13
items = search.get_all_items()
list(items)
[<Item id=SEA_SURFACE_HEIGHT_mon_mean_2014-12_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-02_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-03_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-04_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-05_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-06_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-07_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-08_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-09_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-10_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-11_ECCO_V4r4_latlon_0p50deg>,
 <Item id=SEA_SURFACE_HEIGHT_mon_mean_2015-12_ECCO_V4r4_latlon_0p50deg>]
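The search for datetime='2015' matched 13 items, not 12: the December 2014 monthly mean is included, presumably because its time range touches the start of 2015. If only calendar-year 2015 granules are wanted, one option is to filter on the YYYY-MM stamp in the item id. This is a minimal sketch that assumes the id pattern shown in the listing above; ids_in_year and MONTH_RE are hypothetical names.

```python
import re

# Matches the "_mon_mean_YYYY-MM_" stamp seen in the item ids above.
MONTH_RE = re.compile(r"_mon_mean_(\d{4})-(\d{2})_")


def ids_in_year(item_ids, year):
    """Keep only the ids whose YYYY-MM stamp falls in the given year."""
    kept = []
    for item_id in item_ids:
        match = MONTH_RE.search(item_id)
        if match and int(match.group(1)) == year:
            kept.append(item_id)
    return kept


example_ids = [
    "SEA_SURFACE_HEIGHT_mon_mean_2014-12_ECCO_V4r4_latlon_0p50deg",
    "SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg",
    "SEA_SURFACE_HEIGHT_mon_mean_2015-12_ECCO_V4r4_latlon_0p50deg",
]
print(ids_in_year(example_ids, 2015))
```

Filtering on ids keeps the sketch independent of the STAC client; the same test could be applied to each item's id when building the URL lists.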
ssh_https = items[1].get_assets()['data'].href
ssh_https
'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg.nc'
ssh_s3 = ssh_https.replace('https://archive.podaac.earthdata.nasa.gov/', 's3://')
ssh_s3
's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg.nc'
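The string replacement above works because the bucket name is the first path segment of the HTTPS URL. A slightly more defensive sketch parses the URL and refuses anything that does not come from the expected host, so an unexpected href fails loudly instead of passing through unchanged. The helper name https_to_s3 and its host parameter are assumptions made for illustration.

```python
from urllib.parse import urlparse


def https_to_s3(url, host="archive.podaac.earthdata.nasa.gov"):
    """Convert a PO.DAAC HTTPS asset URL into the matching s3:// URL."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != host:
        raise ValueError("unexpected URL: " + url)
    # The path begins with "/<bucket>/...", so dropping the leading slash
    # leaves "<bucket>/<key>".
    return "s3://" + parsed.path.lstrip("/")
```

Called on ssh_https, this should give the same result as the replace call in the notebook.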

Single-file in-region direct S3 access of a netCDF file

fs_s3 = s3fs.S3FileSystem(anon=False,
                          key=temp_creds_req['accessKeyId'],
                          secret=temp_creds_req['secretAccessKey'],
                          token=temp_creds_req['sessionToken'])
s3_file_obj = fs_s3.open(ssh_s3, mode='rb')
ssh_xr = xr.open_dataset(s3_file_obj, engine='h5netcdf')
ssh_xr
<xarray.Dataset>
Dimensions:         (time: 1, latitude: 360, longitude: 720, nv: 2)
Coordinates:
  * time            (time) datetime64[ns] 2015-01-16T12:00:00
  * latitude        (latitude) float32 -89.75 -89.25 -88.75 ... 89.25 89.75
  * longitude       (longitude) float32 -179.8 -179.2 -178.8 ... 179.2 179.8
    time_bnds       (time, nv) datetime64[ns] 2015-01-01 2015-02-01
    latitude_bnds   (latitude, nv) float32 -90.0 -89.5 -89.5 ... 89.5 89.5 90.0
    longitude_bnds  (longitude, nv) float32 -180.0 -179.5 -179.5 ... 179.5 180.0
Dimensions without coordinates: nv
Data variables:
    SSH             (time, latitude, longitude) float32 ...
    SSHIBC          (time, latitude, longitude) float32 ...
    SSHNOIBC        (time, latitude, longitude) float32 ...
Attributes: (12/57)
    acknowledgement:              This research was carried out by the Jet Pr...
    author:                       Ian Fenty and Ou Wang
    cdm_data_type:                Grid
    comment:                      Fields provided on a regular lat-lon grid. ...
    Conventions:                  CF-1.8, ACDD-1.3
    coordinates_comment:          Note: the global 'coordinates' attribute de...
    ...                           ...
    time_coverage_duration:       P1M
    time_coverage_end:            2015-02-01T00:00:00
    time_coverage_resolution:     P1M
    time_coverage_start:          2015-01-01T00:00:00
    title:                        ECCO Sea Surface Height - Monthly Mean 0.5 ...
    uuid:                         088d03b8-4158-11eb-876b-0cc47a3f47f1

Multi-file in-region direct S3 access of netCDF files

ssh_https_urls = [x.get_assets()['data'].href for x in items]
ssh_https_urls
['https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2014-12_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-02_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-03_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-04_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-05_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-06_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-07_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-08_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-09_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-10_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-11_ECCO_V4r4_latlon_0p50deg.nc',
 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-12_ECCO_V4r4_latlon_0p50deg.nc']
ssh_s3_urls = [x.replace('https://archive.podaac.earthdata.nasa.gov/', 's3://') for x in ssh_https_urls]
ssh_s3_urls
['s3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2014-12_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-02_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-03_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-04_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-05_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-06_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-07_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-08_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-09_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-10_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-11_ECCO_V4r4_latlon_0p50deg.nc',
 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-12_ECCO_V4r4_latlon_0p50deg.nc']
# Open each S3 URL in ssh_s3_urls as a file-like object
fileset = [fs_s3.open(file) for file in ssh_s3_urls]
# Combine the opened files into one lazily loaded (dask-backed) dataset
ssh_xr_ts = xr.open_mfdataset(fileset, engine='h5netcdf')
ssh_xr_ts
<xarray.Dataset>
Dimensions:         (time: 13, latitude: 360, longitude: 720, nv: 2)
Coordinates:
  * time            (time) datetime64[ns] 2014-12-16T12:00:00 ... 2015-12-16T...
  * latitude        (latitude) float32 -89.75 -89.25 -88.75 ... 89.25 89.75
  * longitude       (longitude) float32 -179.8 -179.2 -178.8 ... 179.2 179.8
    time_bnds       (time, nv) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray>
    latitude_bnds   (latitude, nv) float32 dask.array<chunksize=(360, 2), meta=np.ndarray>
    longitude_bnds  (longitude, nv) float32 dask.array<chunksize=(720, 2), meta=np.ndarray>
Dimensions without coordinates: nv
Data variables:
    SSH             (time, latitude, longitude) float32 dask.array<chunksize=(1, 360, 720), meta=np.ndarray>
    SSHIBC          (time, latitude, longitude) float32 dask.array<chunksize=(1, 360, 720), meta=np.ndarray>
    SSHNOIBC        (time, latitude, longitude) float32 dask.array<chunksize=(1, 360, 720), meta=np.ndarray>
Attributes: (12/57)
    acknowledgement:              This research was carried out by the Jet Pr...
    author:                       Ian Fenty and Ou Wang
    cdm_data_type:                Grid
    comment:                      Fields provided on a regular lat-lon grid. ...
    Conventions:                  CF-1.8, ACDD-1.3
    coordinates_comment:          Note: the global 'coordinates' attribute de...
    ...                           ...
    time_coverage_duration:       P1M
    time_coverage_end:            2015-01-01T00:00:00
    time_coverage_resolution:     P1M
    time_coverage_start:          2014-12-01T00:00:00
    title:                        ECCO Sea Surface Height - Monthly Mean 0.5 ...
    uuid:                         08a2fc68-4158-11eb-b498-0cc47a3f6943
ssh_xr_ts.SSH.hvplot.image()

Single-file access via rioxarray (not pretty)

ssh = rioxarray.open_rasterio(ssh_s3)

This call emits only a warning, not an error, but it is a very noisy one.

Note that open_rasterio returns a list here, not a single object; its first element holds the SSH variable.

ssh[0]
ssh[0].SSH
# Keep only values below 9 (presumably to mask out fill values), then plot
ssh[0].SSH.where(ssh[0].SSH < 9).hvplot.image(x='x', y='y')
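Because open_rasterio can hand back either a single object or a list, depending on the file, indexing with [0] only works in the list case. A minimal sketch of a guard that normalizes both cases to a list is shown below; as_list is a hypothetical helper, and it is demonstrated on plain Python values, not real rioxarray output.

```python
def as_list(result):
    """Wrap a single object in a list; pass lists and tuples through as a list."""
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]


# Both call styles can then be handled the same way.
print(as_list(["first", "second"])[0])
print(as_list("only")[0])
```

With the notebook's variable, as_list(ssh)[0] would select the first element whether or not a list was returned.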